
test: add integration tests for "pebble run" #497

Merged. IronCore864 merged 15 commits into canonical:master from integration-test-poc on Sep 23, 2024.

@IronCore864 (Contributor) commented Sep 4, 2024

A proof of concept (PoC) for Pebble integration tests; see the linked issue.

Two issues I want to call out:

  1. It seems the test suite can't be used with build tags: it doesn't collect any tests (testing: warning: no tests to run).

Maybe this is possible and I just haven't made it work yet. One possible reason: according to the Go documentation, a build tag lists the conditions under which a file should be included in the package, and that may not play well with the suite.

This means we can't write func (s *IntegrationSuite) TestXXX(c *C), only func TestXXX(t *testing.T). I think that's OK, but it's worth mentioning.

  2. In the PoC I build the binary in setup, then use some helper functions to create layers, run Pebble, and return logs. The logs part is tricky:

The daemon is a long-running process, so we can't wait for it to "finish". Here I used a heuristic: if there are no new logs in the past second, kill the process and return. It doesn't feel ideal to me.

I had another draft where I simply passed in a timeout value, slept for that long, and then killed the process.

Both worked, but I'm not sure this is the best we can do here.

@dimaqq left a comment

The approach overall makes sense, though I wonder: do build flags allow you to fudge the code under test somehow?

I can see how this allows running integration tests overall, but I'm a bit unclear about how this helps test code that needs root or whatnot.

P.S. maybe get someone from Juju to review the go bits?

@benhoyt (Contributor) left a comment

Leaving comments per our discussion.

@IronCore864 (Contributor, Author) commented Sep 11, 2024

I've resolved all the comments above. Here is a list of test cases for pebble run:

Pebble Run Tests (run_test.go)

1 TestNormal

services:
    svc1:
        override: replace
        command: /bin/sh -c "touch svc1; sleep 1000"
        startup: enabled
    svc2:
        override: replace
        command: /bin/sh -c "touch svc2; sleep 1000"
        startup: enabled
  1. Start pebble.
  2. Pebble will start svc1 and svc2.
  3. Check that svc1 and svc2 are running by checking that the files svc1 and svc2 exist.

This needs a helper function, such as waitForServices, that checks for the files with a timeout.

2 TestCreateDirs

tmpDir := t.TempDir()
pebbleDir := filepath.Join(tmpDir, "PEBBLE_HOME")
  1. Start pebble with --create-dirs.
  2. Pebble will create the directory tmpDir/PEBBLE_HOME.
  3. Check that the directory tmpDir/PEBBLE_HOME exists.

No helper function is needed; use os.Stat.

3 TestHold

services:
    svc1:
        override: replace
        command: /bin/sh -c "touch /home/ubuntu/PEBBLE_HOME/svc1; sleep 1000"
        startup: enabled
  1. Start pebble with --hold.
  2. Pebble daemon starts.
  3. Wait for the log "Started daemon." before running any checks (this needs the waitForLogs helper function).
  4. Sleep for a second before checking services (an immediate check can't rule out svc1 being started shortly after the "Started daemon." log).
  5. Check that svc1 is not running, i.e. that the file svc1 does not exist. No helper is needed; use os.Stat.

4 TestHttpPort

  1. Start pebble with --http=:4000.
  2. Pebble will listen on port 4000.
  3. Check that Pebble is listening on port 4000.

This needs a helper function like func isPortInUseByProcess(port string, processName string) bool.

5 TestVerbose

services:
    svc1:
        override: replace
        command: /bin/sh -c "cat /home/ubuntu/PEBBLE_HOME/layers/001-layer.yaml; sleep 1000"
        startup: enabled
  1. Start pebble with --verbose.
  2. Check that "services:", "svc1:", "override: replace", and "startup: enabled" appear in the logs (this needs the waitForLogs helper function).

6 TestArgs

services:
    svc1:
        override: replace
        command: /bin/sh
        startup: enabled
  1. Start pebble with --verbose --args svc1 -c "cat /home/ubuntu/PEBBLE_HOME/layers/001-layer.yaml; sleep 1000" (--verbose is used so that I can test the args, with the layer file as known content to print; I had to be a bit creative to test args here).
  2. Same checks as the previous test case, using waitForLogs.
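Building the pebble run command line for these tests might look like the sketch below. newPebbleCmd and the binary path are hypothetical; the PEBBLE environment variable points Pebble at the temporary directory. Passing each argument as a separate argv element means the -c script arrives at pebble as a single argument, with no extra shell quoting.

```go
package main

import (
	"os"
	"os/exec"
)

// newPebbleCmd builds (but does not start) a "pebble run" command that uses
// pebbleDir as Pebble's directory via the PEBBLE environment variable.
// Hypothetical helper; pebbleBin is whatever binary the tests use.
func newPebbleCmd(pebbleBin, pebbleDir string, args ...string) *exec.Cmd {
	cmd := exec.Command(pebbleBin, append([]string{"run"}, args...)...)
	// os/exec uses the last value for duplicate keys, so this overrides
	// any PEBBLE already set in the parent environment.
	cmd.Env = append(os.Environ(), "PEBBLE="+pebbleDir)
	return cmd
}
```

For TestArgs, the call would be along the lines of newPebbleCmd(bin, dir, "--verbose", "--args", "svc1", "-c", "echo 'hello world'; sleep 10").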

7 TestIdentities

  1. Create a file named idents-add.yaml, maybe in pebbleDir:
identities:
    bob:
        access: admin
        local:
            user-id: 42
    alice:
        access: read
        local:
            user-id: 2000
  2. Start pebble with --identities.
  3. Run the pebble identity bob command and check that access: admin and user-id: 42 are in the output.
  4. Run the pebble identity alice command and check that access: read and user-id: 2000 are in the output.

This needs a helper like runPebbleCmdAndCheckOutput.
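An alternative split is a generic helper that only runs the command and returns its output, leaving assertions such as "access: admin appears" to the test itself. runCommand is a hypothetical name, and the sh -c call in the check below stands in for a real pebble identity invocation.

```go
package main

import "os/exec"

// runCommand runs name with args and returns its combined stdout and
// stderr. Hypothetical helper: checking the output is left to the caller.
func runCommand(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}
```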

Summary

  • waitForLogs is needed
  • a few more helpers are needed (see the test cases above).

Since these test cases are relatively simple to implement, I will go ahead and do them now. We can review and refactor later.

@IronCore864 (Contributor, Author) commented Sep 12, 2024

I have restructured the files and added the tests mentioned above.

Note that I didn't strictly follow the "rule of 3" when creating these helper functions: some are used only twice, and some only once. The reason is that if some code weren't put in a separate function, the test functions would become much longer and harder to read. So, I kept them as they were. Instead of thinking of them as helper functions, think of them as a way to refactor the tests to improve readability, and if they don't fit future needs, we can refactor them later.

Todo:

  • Add a GitHub Actions workflow to run these integration tests.
  • Use build flags to orchestrate root tests.

@IronCore864 marked this pull request as ready for review on September 12, 2024, 14:51.

@benhoyt (Contributor) left a comment

Thanks for this -- I think it's heading in the right direction. Lots of comments, but they're basically about how we structure the helpers and make the tests more obvious. I think in general it's better to have fewer, more generic helpers. One telltale sign is that some of the names start to get long or specific, like isPortUsedByProcess (very specific to a particular test), writeIdentitiesFile (doesn't actually do anything specific to identities), or runPebbleCmdAndCheckOutput (often an "and" in a function name means you should split it). Happy to discuss any of these further on video if you want.

@benhoyt (Contributor) commented Sep 15, 2024

> The reason is that if some code weren't put in a separate function, the test functions would become much longer and harder to read. So, I kept them as they were. Instead of thinking of them as helper functions, think of them as a way to refactor the tests to improve readability.

I realise this is somewhat subjective, but I think mere length is okay. I definitely disagree about "harder to read" -- I think several of the helpers obscure the logic of the test and make it unclear what it's actually testing. I think there are some minor tweaks we can make to the structure and names to help with this, and I've argued my case in the comments above.

@benhoyt changed the title from "test: integration test poc" to "test: add integration tests for "pebble run"" on Sep 16, 2024
@benhoyt (Contributor) left a comment

FWIW, I just ran the integration tests with the pebble daemon running in another window, and got this error:

$ go test -count=1 -tags=integration ./tests/
--- FAIL: TestHttpPort (3.05s)
    run_test.go:113: Error waiting for logs: timed out waiting for log: Started daemon

It seems to me the two instances should be completely independent, as the integration tests point to a temporary PEBBLE directory. Any ideas why this would fail?

@IronCore864 (Contributor, Author) commented

> FWIW, I just ran the integration tests with the pebble daemon running in another window, and got this error:
>
> $ go test -count=1 -tags=integration ./tests/
> --- FAIL: TestHttpPort (3.05s)
>     run_test.go:113: Error waiting for logs: timed out waiting for log: Started daemon
>
> It seems to me the two instances should be completely independent, as the integration tests point to a temporary PEBBLE directory. Any ideas why this would fail?

I only saw this comment after addressing all the comments above, and with those fixes in place I can't reproduce the failure. Could you confirm?

@benhoyt (Contributor) commented Sep 18, 2024

Regarding the TestHttpPort failure when the daemon is running -- no, I can't reproduce now either. We'll call it fixed.

@benhoyt (Contributor) left a comment

Nice, this is much cleaner now. A few minor comments, and one comment about the nested goroutines in pebbleRun.

@benhoyt (Contributor) left a comment

Looking good, thanks!

One remaining thing I noticed after running the integration tests -- they don't seem to clean up after themselves very well. After running them, I get:

$ ps aux | grep sleep
ben       137333  0.0  0.0   2800  1536 pts/2    S    08:56   0:00 /bin/sh -c touch /tmp/TestStartupEnabledServices3374203583/001/svc1; sleep 1000
ben       137334  0.0  0.0   2800  1536 pts/2    S    08:56   0:00 /bin/sh -c touch /tmp/TestStartupEnabledServices3374203583/001/svc2; sleep 1000
ben       137337  0.0  0.0   8288  1920 pts/2    S    08:56   0:00 sleep 1000
ben       137338  0.0  0.0   8288  1920 pts/2    S    08:56   0:00 sleep 1000
ben       137373  0.0  0.0   2800  1536 pts/2    S    08:56   0:00 /bin/sh -c echo 'hello world'; sleep 1000
ben       137374  0.0  0.0   8288  1920 pts/2    S    08:56   0:00 sleep 1000
ben       137384  0.0  0.0   2800  1536 pts/2    S    08:56   0:00 /bin/sh -c echo 'hello world'; sleep 1000
ben       137385  0.0  0.0   8288  1920 pts/2    S    08:56   0:00 sleep 1000

Any idea why? Maybe we can do some logging to see why. I would have thought sending SIGINT should cause Pebble to stop the running services before exiting, but it's clearly not doing that (properly, at any rate).

One thing we could do to mitigate it (but not fix it, so we should still look into it) is changing the sleep 1000 to sleep 10 -- that's still plenty long enough for these tests, but at least they'll exit sooner themselves.

I'll also ask Harry to review this, as he originally created the issue.

@hpidcock (Member) left a comment

Looking good so far. Just a few nitpicks.

@benhoyt (Contributor) commented Sep 20, 2024

I agree with all of Harry's comments (though I'd take a slightly different, explicit approach rather than --no-build). Thanks for the review.

@IronCore864 (Contributor, Author) commented

Replying to the comments above:

  • I have added comments to all the tests and most of the helper functions.
  • As Ben suggested, I added a new -pebbleBin flag; if it is set, that binary is used for the integration tests instead of building one.
  • Services not stopped after SIGINT is sent to Pebble is mitigated by changing sleep 1000 to sleep 10, but it's not resolved.
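A -pebbleBin flag along these lines could be parsed as in the sketch below. parsePebbleBin is hypothetical: it uses its own flag.FlagSet so it can be exercised in isolation, whereas a real TestMain would more likely register the flag on the global flag set and call flag.Parse. An empty value means "build Pebble from source".

```go
package main

import (
	"flag"
	"io"
)

// parsePebbleBin parses a -pebbleBin flag from args and returns its value.
// An empty result means no prebuilt binary was given, so the tests should
// build one. Hypothetical sketch.
func parsePebbleBin(args []string) (string, error) {
	fs := flag.NewFlagSet("integration", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of test output
	bin := fs.String("pebbleBin", "", "path to a prebuilt pebble binary; if empty, build one")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *bin, nil
}
```

With a globally registered flag, go test can forward the value to the test binary, for example: go test -count=1 -tags=integration ./tests/ -args -pebbleBin=/path/to/pebble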

@benhoyt (Contributor) left a comment

Great, thanks!

@benhoyt (Contributor) commented Sep 20, 2024

> Services not stopped after SIGINT is sent to Pebble is mitigated by changing sleep 1000 to sleep 10, but it's not resolved.

Can you please spend a bit of time looking into why this is? I just don't want to paper over something if it's an actual bug.

@benhoyt (Contributor) commented Sep 22, 2024

I looked into this further, and debugged by printing out the Pebble/stderr logs (and running the tests with -v so they showed up). Then I enabled PEBBLE_DEBUG=1 in the subprocess, and this helped me see that when SIGTERM was sent, it was before the "start" had completed (which takes 1s). So stopRunningServices thought there were "No services to stop." as they were in "starting" state, and didn't try to send SIGTERM to the children. Arguably stopRunningServices should include the services in "starting" state. I might open a separate issue on that.
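The suggested change (treating services in the "starting" state as candidates for stopping) can be illustrated with a toy filter. This is not Pebble's stopRunningServices; servicesToStop, the state strings, and the map-based input are assumptions made purely for illustration.

```go
package main

import "sort"

// servicesToStop returns the names of services a shutdown should signal,
// including those still "starting" as well as those "running".
// Hypothetical sketch; state names are illustrative, not Pebble's constants.
func servicesToStop(states map[string]string) []string {
	var names []string
	for name, state := range states {
		switch state {
		case "starting", "running":
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	return names
}
```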

In the meantime, it's probably best to wait until the "startup: enabled" services have started when the daemon starts up. The simplest way to do this now is to wait for the "Started default services ..." log. The full log line looks like this:

2024-09-22T22:10:40.147Z [pebble] Started default services with change 1.

I recommend the following diff, which cleans up nicely in my tests:

diff --git a/tests/run_test.go b/tests/run_test.go
index c73c365..c98c693 100644
--- a/tests/run_test.go
+++ b/tests/run_test.go
@@ -49,7 +49,8 @@ services:
 
        createLayer(t, pebbleDir, "001-simple-layer.yaml", layerYAML)
 
-       _, _ = pebbleRun(t, pebbleDir)
+       _, stderrCh := pebbleRun(t, pebbleDir)
+       waitForLog(t, stderrCh, "pebble", "Started default services", 3*time.Second)
 
        waitForFile(t, filepath.Join(pebbleDir, "svc1"), 3*time.Second)
        waitForFile(t, filepath.Join(pebbleDir, "svc2"), 3*time.Second)
@@ -141,6 +142,7 @@ services:
        stdoutCh, stderrCh := pebbleRun(t, pebbleDir, "--verbose")
        waitForLog(t, stderrCh, "pebble", "Started daemon", 3*time.Second)
        waitForLog(t, stdoutCh, "svc1", "hello world", 3*time.Second)
+       waitForLog(t, stderrCh, "pebble", "Started default services", 3*time.Second)
 }
 
 // TestArgs tests that Pebble provides additional arguments to a service
@@ -166,6 +168,7 @@ services:
        )
        waitForLog(t, stderrCh, "pebble", "Started daemon", 3*time.Second)
        waitForLog(t, stdoutCh, "svc1", "hello world", 3*time.Second)
+       waitForLog(t, stderrCh, "pebble", "Started default services", 3*time.Second)
 }
 
 // TestIdentities tests that Pebble seeds identities from a file

@benhoyt (Contributor) commented Sep 22, 2024

I've opened #502 to track having Pebble terminate services in "starting" state as well.

@IronCore864 (Contributor, Author) commented

While testing the leaked "sleep" issue, I found another place that leaks a sleep. Since it's not related to this feature, I will merge this PR and continue debugging it in another branch.

@IronCore864 merged commit 0ca17af into canonical:master on Sep 23, 2024 (18 checks passed).
@IronCore864 deleted the integration-test-poc branch on September 23, 2024, 07:18.